Task i
Task ii
Task iii
Explanation
Explanation
Explanation
Explanation
Explanation
Task i
Task ii
Task iii
Task iv
Task v
Task vi
Heat Map Average Price per City
Explanation
Explanation
---
title: "FlexDashBoard"
author: "Jerome Agius & Matthias Bartolo"
date: "12/16/2021"
output:
flexdashboard::flex_dashboard:
orientation: rows
social: menu
source_code: embed
vertical_layout: scroll
---
```{r setup, include=FALSE}
library(ggplot2)
library(flexdashboard)
# Make some noisily increasing data
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each=10),
xvar = 1:20 + rnorm(20,sd=3),
yvar = 1:20 + rnorm(20,sd=3))
```
Explanation Task 1 i, ii, iii
=======================================================================
Row {data-height=450}
-----------------------------------------------------------------------
### Chart 1
```{r}
valueBox("Task i", caption = "
The data features found in the .csv file include house_price, rental_agency, city, bedrooms and surface.
Observations:
1. The data found in the house_price column consisted of continuous integer variables which were of ratio type. This was because 0 implies the absence of money.
2. The data found in the rental_agency column consisted of text which is a categorical variable. This data is also of the nominal type as it has no inbuilt order.
3. The data found in the city column consisted of text which is a categorical variable. This data is also of the nominal type as it has no inbuilt order.
4. The data found in the bedrooms column consited of discrete integer variables (as one can only have a whole number of rooms example 3, one can't have 3.5 rooms for instance). This data is also of the ratio type as 0 implies the absence of a bedroom.
5. The data found in the surface column was of the type continuous integer variable. It also was of ratio type as 0 implies the absence of surface area
", col="Blue")
```
Row {data-height=450}
-----------------------------------------------------------------------
### Chart 2
```{r}
valueBox("Task ii", caption = "
1. To be able to determine the relationship between the different variables one could output a scatter plot and determine the trend (best fit line).
2. One must remove any instances of duplicate data, this can be done by using the function duplicated() which will retrieve all duplicate rows or unique() which will retrieve all unique rows instead.
example:
- data <- read.csv( \"csv file path\" )
- No_Duplicates <- unique(data)
which will store all the unique rows in the variable data thus removing all instances apart from 1 of the duplicated data.
3. One can also work out the central tendency, dispersion and standard deviation where applicable to better understand how the data varies.
", col="green")
```
Row {data-height=450}
-----------------------------------------------------------------------
### Chart 3
```{r}
valueBox("Task iii", caption = "In this section we aimed to find and understand the distributions in relation to the attributes of the data set.
We accomplished this by first finding the mean where applicable and then we plotted a histogram of each attribute of the data set, via the ggqqplot function.
We also compared the values in relation to a normal distribution.
With the information available to us, we thus concluded that the house_price and surface area had a normal distribution, the bedrooms had a poisson distribution, whilst the rental_agency and city seemed randomly distributed.
In light of this it doesn't make sense to use the same distribution for all the data, as the bedrooms column has a different distribution than the house_price and surface area columns.
The histograms and QQ-plots used in this section can be viewed in the Task 1 Plots section", col="blue")
```
Prices {data-navmenu="Task 1 Plots"}
=======================================================================
Row
-----------------------------------------------------------------------
### Text
```{r}
valueBox("Explanation", caption = "The histogram shows the frequency of the prices across the country. Whilst the Q-Q plot shows how close the histogram is to having a normal distribution. We concluded that this graph was of a normal distribution if the heavy tails were excluded.", col="Blue")
```
Row {data-height=650}
-----------------------------------------------------------------------
### House Price histogram
```{r}
library(tidyverse)
library(ggpubr)
dataInitial <- read.csv("C:/Users/Jerome/Documents/GroupProject/Clean.csv")
dataInitial$X <- NULL
hist(dataInitial$house_price,xlim=c(0,17500), main="House Price histogram", col="blue",breaks=100, xlab="House Prices")
```
### House Price Quantile-Quantile plot
```{r}
ggqqplot(dataInitial$house_price, title="house_price Quantile-Quantile plot", col="blue")+theme(plot.title = element_text(hjust = 0.5))
```
Agencies {data-navmenu="Task 1 Plots"}
=======================================================================
Row
-----------------------------------------------------------------------
### Text
```{r}
valueBox("Explanation", caption = "The histogram shows the frequency of each agency (each agency is represented by a number) across the country. Whilst the Q-Q plot shows how close the histogram is to having a normal distribution. We concluded that this graph was of a random distribution as it doesn't have a consistent form.", col="pink")
```
Row {data-height=650}
-----------------------------------------------------------------------
### Agency histogram
```{r}
agency <- c(dataInitial$rental_agency)
agency.factor <- factor(agency)
Agency <- as.numeric(agency.factor)
hist(Agency, breaks=100, main="Agency histogram", col="pink", xlab="Agencies")
```
### Agency Quantile-Quantile plot
```{r}
ggqqplot(Agency, title="Agency Quantile-Quantile plot", col="pink" )+theme(plot.title = element_text(hjust = 0.5))
```
Cities {data-navmenu="Task 1 Plots"}
=======================================================================
Row
-----------------------------------------------------------------------
### Text
```{r}
valueBox("Explanation", caption = "The histogram shows the frequency of each city (each city is represented by a number) across the country. Whilst the Q-Q plot shows how close the histogram is to having a normal distribution. We concluded that this graph was of a random distribution as it doesn't have a consistent form.", col="green")
```
Row {data-height=650}
-----------------------------------------------------------------------
### City histogram
```{r}
city <- c(dataInitial$city)
city.factor <- factor(city)
City <- as.numeric(city.factor)
names(City) <- dataInitial$city
hist(City, breaks=100, main="City histogram", col="green", xlab="Cities")
```
### City Quantile-Quantile plot
```{r}
ggqqplot(City, title="City Quantile-Quantile plot", col="green")+theme(plot.title = element_text(hjust = 0.5))
```
No. of bedrooms {data-navmenu="Task 1 Plots"}
=======================================================================
Row
-----------------------------------------------------------------------
### Text
```{r}
valueBox("Explanation", caption = "The histogram shows the frequency of the number of bedrooms in each house across the country. Whilst the Q-Q plot shows how close the histogram is to having a normal distribution. We concluded that this graph was of a poisson distribution.", col="red")
```
Row {data-height=650}
-----------------------------------------------------------------------
### Bedrooms histogram
```{r}
hist(dataInitial$bedrooms,xlim=c(0,17),breaks=18, main="Bedrooms histogram", col="red", xlab="No of Bedrooms")
```
### Bedrooms Quantile-Quantile plot
```{r}
ggqqplot(dataInitial$bedrooms, title="Bedrooms Quantile-Quantile plot", col="red")+theme(plot.title = element_text(hjust = 0.5))
```
Surface Area {data-navmenu="Task 1 Plots"}
=======================================================================
Row
-----------------------------------------------------------------------
### Text
```{r}
valueBox("Explanation", caption = "The histogram shows the frequency of the surface area across the country. Whilst the Q-Q plot shows how close the histogram is to having a normal distribution. We concluded that this graph was of a normal distribution if the heavy tails were excluded.", col="orange")
```
Row {data-height=650}
-----------------------------------------------------------------------
### Surface histogram
```{r}
hist(dataInitial$surface,xlim=c(0,1645),breaks=100, main="Surface histogram", col="orange", xlab="Surface Area")
```
### Surface Quantile-Quantile plot
```{r}
ggqqplot(dataInitial$surface, title="Surface Quantile-Quantile plot", col="orange")+theme(plot.title = element_text(hjust = 0.5))
```
Explanation Task 3 i, ii, iv, v, vi
=======================================================================
Row {data-height=400}
-----------------------------------------------------------------------
### Chart 1
```{r}
valueBox("Task i", caption = "In this section we aimed to find the means of the surface area, bedrooms, and house_price from a sample.
This was achieved by making use of Stratified sampling by dividing the data into groups according to their cities.
Depending on the number of houses for each city we took a respectable sample such that the sample would still represent the original data.
Afterwards we worked out the mean for the prices, bedrooms, and surface area for the sampled data set.
Our results consisted of:
Mean Surface Area: 75.62908
Mean Number of bedrooms: 2.726967
Mean House Price: 1421.123
", col="orangered")
```
### Chart 2
```{r}
valueBox("Task ii", caption = "In this section we aimed to find the two cities which had the highest and lowest living cost.
To do this we worked out the average price per square meter of each city in the data set and we compared said results to be able to retrieve the maximum and the minimum values.
According to our findings Beinsdorp is the most expensive city to live in with an average price per square meter of 44.05, whilst Wagenborgen is the cheapest city to live in with an average price per square meter of 0.3.", col="orange")
```
Row {data-height=400}
-----------------------------------------------------------------------
### Chart 3
```{r}
valueBox("Task iii", caption = "In this section we aimed to create a heatmap to show how the prices per meter squared of area fare across the country.
This was achieved by first importing a data set containing the longitude and latitude coordinates for to the Netherlands, then we retrieved the required values which would dictate the intensity of the heatmap, and we mapped all this data onto a map of the Netherlands whcih was created via the leaflet library.
Lastly, we also added a togglable set of markers on all relevant locations which when hovered over display the name and average house price of the city.
The functioning heat map can be seen in the Task 3 iii Heat Map section of this flexdashboard.
", col="orange")
```
### Chart 4
```{r}
valueBox("Task iv", caption = "The aim of this section was to find out the standard deviation and the distribution of the house prices in Amsterdam and Rotterdam, this was achieved by first retrieving the indexes of those rows, which had buildings located in Amsterdam and Rotterdam respectively.
Then we simply worked out the standard deviation of Amsterdam and Rotterdam by using the sd function.
Afterwards we plotted two histograms and worked out the means for Amsterdam and Rotterdam, to be able to conclude their distribution.
We thus, concluded that for the most part both are normally distributed but both have somewhat heavy tails which can skew the data.
This was also supported via the QQ-plot graphs.
Both the histograms as awell as QQ-plots can be seen in the Task 3 Plots section", col="orangered")
```
Row {data-height=400}
-----------------------------------------------------------------------
### Chart 5
```{r}
valueBox("Task v", caption = "The aim of this section was to identify and interpret correlations between the variables.
To do so we worked out the correlation between the prices and surface area of Amsterdam and Rotterdam and found out that both have a positive correlation but Amsterdam's correlation was stronger.
Amsterdam's correlation was 0.82 whilst that of Rotterdam was 0.79.
This implies that for both cities the surface increases as the price increases and vice-versa.
We also worked out the correlation between the house price and the number of bedrooms, as well as the house price and the surface area of the building.
We did this as we assumed that the number of bedrooms and the size of the house would in some way correlate to the house price.
Even though this was the case it turns out that the correlation of the former was 0.54 whilst that of the latter was 0.66 which is a moderate correlation.
In relation to the correlation between the bedrooms and the surface area we found a strong correlation of 0.73 which implies that a larger house is more likely to have more bedrooms and vice-versa.
", col="orangered")
```
### Chart 6
```{r}
valueBox("Task vi", caption = "In this section we were requested to predict the typical meter square apartment with 3 bedrooms in Amsterdam and Rotterdam as well as the monthly rent for a 125 meters squared apartment for the same two cities.
The final results were:
- Surface Area Amsterdam: 83.37
- 3 bedrooms Rotterdam: 83.44
- Monthly rent Amsterdam: 2811.23
- Monthly rent Rotterdam: 1885.51
", col="orange")
```
Task 3 iii Heat Map
=======================================================================
Row
-----------------------------------------------------------------------
### Text
```{r}
valueBox("Heat Map Average Price per City", caption = "The Heat map displays the Average price per square meter of each city.", col="lightblue")
```
Row {data-height=700}
-----------------------------------------------------------------------
### Heat Map
```{r}
library(ggplot2)
library(tidyverse)
library(maps)
library(leaflet)
library(leaflet.extras)
LatLng<-read.csv("~/GroupProject/locationC.csv")
Averages <- vector()
Cities <- vector()
map<-vector()
for (i in unique(dataInitial$city)){
tmp <- which(dataInitial$city==i)
Cities <- c(Cities,i)
Hprice <- c(dataInitial[tmp,1])
Surface <- c(dataInitial[tmp,5])
MP <- mean(Hprice)
MS <- mean(Surface)
Averages <- c(Averages,(MP/MS))
}
LatLng<-cbind(LatLng,"Averages"=NA)
for(i in 1:NROW(Cities)){
row = which(LatLng$city == Cities[i])
LatLng[row,10]<-Averages[i]
}
LatLng <- na.omit(LatLng)
map<-leaflet(LatLng,width="1000px",height="1000px")%>%
addTiles(group="city")%>%
setView(LatLng,lng = 4.895168, lat = 52.370216, zoom = 8)%>%
addHeatmap(group=dataInitial$house_price, lng=~LatLng$lng, lat=~LatLng$lat,
intensity=~LatLng$Averages, max=25, blur = 35)%>%
addMarkers(lng=LatLng$lng,lat=LatLng$lat,label=paste(LatLng$city,LatLng$Averages),
options = popupOptions(closeButton = TRUE),group="Name/Prices")%>%
addLayersControl(
overlayGroups= c("Name/Prices"),
options = layersControlOptions(collapsed = FALSE))
map
```
Amsterdam {data-navmenu="Task 3 Plots"}
=======================================================================
Row
-----------------------------------------------------------------------
### Text
```{r}
valueBox("Explanation", caption = "The histogram shows the frequency of the avergae price per square meter in Amsterdam. Whilst the Q-Q plot shows how close the histogram is to having a normal distribution.", col="brown")
```
Row {data-height=650}
-----------------------------------------------------------------------
### Amsterdam Histogram
```{r}
library(ggpubr)
Amsterdam <- vector()
Rotterdam <- vector()
AR <- which(dataInitial$city=="Amsterdam")#Contains the index of all
#the rows which have city Amsterdam
RR <- which(dataInitial$city=="Rotterdam")
for(i in AR){
Amsterdam <- c(Amsterdam, dataInitial[i,1])#Filling the vector with the respective prices
}
for(i in RR){
Rotterdam <- c(Rotterdam, dataInitial[i,1])
}
SdA <- sd(Amsterdam)#Working out the standard deviation and the mean
MA <- mean(Amsterdam)
SdR <- sd(Rotterdam)
MR <- mean(Rotterdam)
hist(Amsterdam,breaks = 70,main="Amsterdam",xlab="Prices",col="brown",xlim=c(400,17500))
```
### Amsterdam Quantile-Quantile plot
```{r}
ggqqplot(Amsterdam, title="Amsterdam Quantile-Quantile plot", col="brown")+theme(plot.title = element_text(hjust = 0.5))
```
Rotterdam {data-navmenu="Task 3 Plots"}
=======================================================================
Row
-----------------------------------------------------------------------
### Text
```{r}
valueBox("Explanation", caption = "The histogram shows the frequency of the avergae price per square meter in Rotterdam. Whilst the Q-Q plot shows how close the histogram is to having a normal distribution.", col="purple")
```
Row {data-height=650}
-----------------------------------------------------------------------
### Rotterdam Histogram
```{r}
hist(Rotterdam,breaks = 60,main="Rotterdam",xlab="Prices",col="purple",xlim=c(200,11500))
```
### Rotterdam Quantile-Quantile plot
```{r}
ggqqplot(Rotterdam, title="Rotterdam Quantile-Quantile plot", col="purple")+theme(plot.title = element_text(hjust = 0.5))
```